A Fractal Approach for Selecting an Appropriate Bin Size for Cell-Based Diversity Estimation

Authors

  • Dimitris K. Agrafiotis
  • Dmitrii N. Rassokhin
Abstract

A novel approach for selecting an appropriate bin size for cell-based diversity assessment is presented. The method measures the sensitivity of the diversity index as a function of grid resolution, using a box-counting algorithm that is reminiscent of those used in fractal analysis. It is shown that the relative variance of the diversity score (sum of squared cell occupancies) of several commonly used molecular descriptor sets exhibits a bell-shaped distribution, whose exact characteristics depend on the distribution of the data set, the number of points considered, and the dimensionality of the feature space. The peak of this distribution represents the optimal bin size for a given data set and sample size. Although box counting can be performed in an algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
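The box-counting scan described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes descriptors are already scaled to the unit hypercube, and it assumes that "relative variance" means the variance of the diversity score over random subsets divided by the squared mean score; the abstract does not give the exact normalisation. All function names and parameters (`diversity_score`, `relative_variance`, `scan_resolutions`, `subset_size`, `n_subsets`) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def diversity_score(points, bins_per_axis):
    """Sum of squared cell occupancies on a uniform grid over [0, 1]^d."""
    cells = Counter(
        # Map each coordinate to a bin index; clamp x == 1.0 into the last bin.
        tuple(min(int(x * bins_per_axis), bins_per_axis - 1) for x in p)
        for p in points
    )
    return sum(n * n for n in cells.values())


def relative_variance(points, bins_per_axis, subset_size, n_subsets, rng):
    """Variance of the score over random subsets, divided by the squared mean.

    Assumption: this normalisation is only one plausible reading of the
    "relative variance" mentioned in the abstract.
    """
    scores = [
        diversity_score(rng.sample(points, subset_size), bins_per_axis)
        for _ in range(n_subsets)
    ]
    m = mean(scores)  # always >= subset_size, so never zero
    return pvariance(scores) / (m * m)


def scan_resolutions(points, resolutions, subset_size, n_subsets=50, seed=0):
    """Relative variance of the diversity score for each grid resolution."""
    rng = random.Random(seed)
    return {
        k: relative_variance(points, k, subset_size, n_subsets, rng)
        for k in resolutions
    }


if __name__ == "__main__":
    # Synthetic 2-D data standing in for a real descriptor set.
    data_rng = random.Random(42)
    data = [(data_rng.random(), data_rng.random()) for _ in range(2000)]
    curve = scan_resolutions(data, [1, 2, 4, 8, 16, 32, 64], subset_size=200)
    best = max(curve, key=curve.get)
    print("bins per axis with the highest relative variance:", best)
```

At the coarsest resolution (one bin per axis) every subset receives the same score, so the relative variance is zero; at very fine resolutions almost every point occupies its own cell and the score again stops discriminating between subsets. Under the assumptions above, the resolution with the highest relative variance plays the role of the peak that the abstract identifies as the optimal bin size.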

Related resources

Performance of eight mathematical models in describing particle size distribution in some soils of Chaharmahal and Bakhtiari Province

Selecting an appropriate particle size distribution (PSD) model for a particular soil may be important for a precise estimation of soil hydraulic properties. Various models have been proposed for describing soil PSDs. The objective of this study was to compare the quality of fitting of eight PSD models (Fredlund, Gompertz, van Genuchten, Jaki, Logarithmic, Exponential, Logarithmic-Exponential a...

Assessment of protected vs. degraded oak forests: A geostatistical approach based on soil and plant diversity

Assessment of forest soil and vegetation characteristics provides basic and essential information for protection and rehabilitation measures in forest ecosystems. Therefore, given the importance of this issue, the distribution of different soil properties and vegetation diversity in relation to conservation management and degradation was investigated in the oak forests of Ilam province usin...

The effect of estimation methods on fractal modeling for anomalies’ detection in the Irankuh area, Central Iran

This study aims to assess the effect of the Ordinary Kriging (OK) and Inverse Distance Weighted (IDW) estimation methods on the separation of geochemical anomalies, based on soil samples, using the Concentration-Area (C-A) fractal model in the Irankuh area, central Iran. Variograms and anisotropic ellipsoids were generated for the Pb and Zn distribution. Threshold values from the C-A log-log plots based on the e...

DASTWAR: a tool for completeness estimation in magnitude-size plane

Today, great observatories around the world devote a substantial amount of observing time to sky surveys. The resulting images are inputs to source-finder modules. These modules search for the target objects and provide us with source catalogues. We sought to quantify the ability of detection tools to recover faint galaxies regularly encountered in deep surveys. Our approach was based on com...

Resources classification using fractal modelling in Eastern Kahang Cu-Mo porphyry deposit, Central Iran

Resources/reserves classification is crucial for the block model creation utilised in mine planning and feasibility studies. Selection of estimation methods is an essential part of mineral exploration and mining activities. In other words, resources classification is an issue for mining companies, investors, financial institutions and authorities, but it remains subject to some confusion. The aim of t...

Journal:
  • Journal of Chemical Information and Computer Sciences

Volume 42, Issue 1

Pages: -

Publication date: 2002